Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning

نویسندگان

  • Ishai Menache
  • Shie Mannor
  • Nahum Shimkin
چکیده

We present the Q-Cut algorithm, a graph theoretic approach for automatic detection of sub-goals in a dynamic environment, which is used for acceleration of the Q-Learning algorithm. The learning agent creates an on-line map of the process history, and uses an efficient MaxFlow/Min-Cut algorithm for identifying bottlenecks. The policies for reaching bottlenecks are separately learned and added to the model in a form of options (macro-actions). We then extend the basic Q-Cut algorithm to the Segmented Q-Cut algorithm, which uses previously identified bottlenecks for state space partitioning, necessary for finding additional bottlenecks in complex environments. Experiments show significant performance improvements, particulary in the initial learning phase.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Mini/Micro-Grid Adaptive Voltage and Frequency Stability Enhancement Using Q-learning Mechanism

This paper develops an adaptive control method for controlling frequency and voltage of an islanded mini/micro grid (M/µG) using reinforcement learning method. Reinforcement learning (RL) is one of the branches of the machine learning, which is the main solution method of Markov decision process (MDPs). Among the several solution methods of RL, the Q-learning method is used for solving RL in th...

متن کامل

Fixed vs Dynamic Sub-transfer in Reinforcement Learning Technical report

We survey various transfer methods in Q-learning, a type of reinforcement learning, and present a variation on fixed sub-transfer which we call dynamic sub-transfer. We describe the pros and cons of dynamic sub-transfer as compared with the other transfer methods, and we describe qualitatively the situations where this method would be preferred over the fixed version of sub-transfer.

متن کامل

Learning to Achieve Goals

Temporal diierence methods solve the temporal credit assignment problem for reinforcement learning. An important subproblem of general reinforcement learning is learning to achieve dynamic goals. Although existing temporal diierence methods, such as Q learning, can be applied to this problem, they do not take advantage of its special structure. This paper presents the DG-learning algorithm, whi...

متن کامل

Multicast Routing in Wireless Sensor Networks: A Distributed Reinforcement Learning Approach

Wireless Sensor Networks (WSNs) are consist of independent distributed sensors with storing, processing, sensing and communication capabilities to monitor physical or environmental conditions. There are number of challenges in WSNs because of limitation of battery power, communications, computation and storage space. In the recent years, computational intelligence approaches such as evolutionar...

متن کامل

User-based Vehicle Route Guidance in Urban Networks Based on Intelligent Multi Agents Systems and the ANT-Q Algorithm

Guiding vehicles to their destination under dynamic traffic conditions is an important topic in the field of Intelligent Transportation Systems (ITS). Nowadays, many complex systems can be controlled by using multi agent systems. Adaptation with the current condition is an important feature of the agents. In this research, formulation of dynamic guidance for vehicles has been investigated based...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002